Graph Mining under Linguistic Constraints for Exploring Large Texts

Authors

  • Solen Quiniou
  • Peggy Cellier
  • Thierry Charnois
  • Dominique Legallois
Abstract

In this paper, we propose an approach to explore large texts by highlighting coherent sub-parts. The exploration method relies on a graph representation of the text built according to Hoey’s linguistic model, which allows the selection and the binding of adjacent and non-adjacent sentences. Our main contribution is a method that combines Hoey’s linguistic model with a special graph mining technique, called CoHoP mining, to extract coherent sub-parts of the graph representation of the text. Experiments on several English texts show the value of the proposed approach.
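The graph representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that two sentences are bound when they share at least `min_shared` content words, uses exact word matching as a simplified stand-in for the lexical repetition links of Hoey's model, and relies on a small hypothetical stopword list. The function names and the default threshold of 3 are illustrative assumptions. CoHoP mining itself is not shown.

```python
import re
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "are", "on", "by", "it"}


def content_words(sentence):
    """Return the set of lowercased word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}


def build_sentence_graph(sentences, min_shared=3):
    """Map each sentence index to the set of sentence indices it is linked to.

    Two sentences are linked when they share at least `min_shared` content
    words, whether or not they are adjacent in the text. The threshold is an
    assumed parameter chosen for illustration.
    """
    words = [content_words(s) for s in sentences]
    graph = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if len(words[i] & words[j]) >= min_shared:
            graph[i].add(j)
            graph[j].add(i)
    return graph


sentences = [
    "Graph mining extracts patterns from a text graph.",
    "The weather was pleasant on that day.",
    "Patterns in the text graph reveal coherent mining results.",
]
graph = build_sentence_graph(sentences)
```

In this toy input the first and third sentences are linked even though they are not adjacent, while the second sentence stays isolated. A graph mining step would then operate on a structure of this kind to extract coherent sub-parts.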


Similar resources

Fouille de graphes sous contraintes linguistiques pour l'exploration de grands textes (Graph Mining Under Linguistic Constraints to Explore Large Texts) [in French]

In this paper, we propose an approach to explore large texts by highlighting coherent sub-parts. The exploration method relies on a graph representation of the text according to the Hoey linguistic model which allows the selection and the binding of sentences in the graph. Our contribution relates to using graph mining techniques ...


Transliteration Mining Using Large Training and Test Sets

Much previous work on Transliteration Mining (TM) was conducted on short parallel snippets using limited training data, and successful methods tended to favor recall. For such methods, increasing training data may impact precision and application on large comparable texts may impact precision and recall. We adapt a state-of-the-art TM technique with the best reported scores on the ACL 2010 NEWS...


Learning from Heterogeneous Genomic Data

Mining patterns under many kinds of constraints is a key point to successfully get new knowledge. In this paper, we propose an efficient new algorithm Music-dfs which soundly and completely mines patterns with various constraints from large data and takes into account external data represented by several heterogeneous datasets. Constraints are freely built of a large set of primitives and enabl...


Shallow vs. Deep Techniques for Handling Linguistic Constraints and Optimisations

An important aspect of many NLG systems is ensuring that all generated texts obey linguistic constraints and are (near-)optimal under linguistic quality measures. Where they are possible, deep techniques can automate the enforcement of linguistic constraints and optimisations. In contrast, shallow techniques require developers to explicitly enforce constraints and optimisations. Deep techniques...


Representation of texts as complex networks: a mesoscopic approach

Texts are complex structures emerging from an intricate system consisting of syntactical constraints and semantical relationships. While the complete modeling of such structures is impractical owing to the high level of complexity inherent to linguistic constructions, under a limited domain, certain tasks can still be performed. Recently, statistical techniques aiming at analysis of texts, refe...




Publication date: 2013